Serving and consuming an HTTP multipart/mixed response in Python #33
Could you please add a small (similar to https://github.com/apache/arrow-experiments/blob/main/http/get_simple/python/server/README.md)
Can you use carets in the markdown footnotes (like this) so GitHub renders them as footnotes? Thanks
Done. I was going to look up the syntax after seeing the bad results.
Thanks @felipecrv, this looks great! The only problem I see here is that the calls to
I will merge later today if there are no other comments |
Yeah. It's the parsing logic. Passing the entire 1GB message blob to I called this |
The client parses the multipart response produced by server/server.py
using the multipart message parser from the Python email module.
This module loads the entire message into memory and seems to spend a lot
of time looking for part delimiters and encoding/decoding the parts.
On my machine, multipart/mixed parsing accounts for 85% of the total
execution time. Once the ~1GB Arrow Stream message is fully in memory,
parsing it takes only 0.06% of the total execution time.
$ python simple_client.py
-- 3731 bytes of JSON data:
[
  {'ticker': 'SGJ', 'description': 'Syhnffek Gacb Jdylqis'}
  {'ticker': 'EILD', 'description': 'Eicfef Iiafeutm Lydut Dbmgq'}
  {'ticker': 'QTO', 'description': 'Qclxkqjd Tkxan Odmac'}
  {'ticker': 'IHTS', 'description': 'Iowjy Hieuj Tvwecy Smxedh'}
  {'ticker': 'TGFJ', 'description': 'Tvztlhba Garebomj Fnwvwgf Jffldbg'}
  ...+55 entries...
]
-- 988931832 bytes of Arrow data:
Schema:
  ticker: string
  price: int64
  volume: int64
Parsed 42000000 records in 6836 batch(es)
-- Text Message:
Hello Client,

6836 Arrow batch(es) were sent in 6.561 seconds through 6837 HTTP response chunks. Average size of each chunk was 144644.13 bytes.

--
Sincerely,
The Server
-- End of Text Message --
13.645 seconds elapsed
11.833 seconds (86.72%) seconds parsing multipart/mixed response
0.011 seconds (0.08%) seconds parsing Arrow stream

Closes apache/arrow#40598
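For readers who want to try the parsing approach described above, here is a minimal sketch of handing a complete multipart/mixed message to the standard library's email parser and timing that step. It is not the code from this PR: the boundary string, the `build_message` and `parse_message` helpers, and the payloads are all hypothetical, and ASCII filler stands in for the real Arrow stream bytes so the example needs nothing beyond the standard library.

```python
import time
from email.parser import BytesParser

BOUNDARY = b"example-boundary"  # hypothetical boundary, not the server's


def build_message(parts):
    """Serialize (content_type, payload) pairs as one multipart/mixed message."""
    chunks = [b'Content-Type: multipart/mixed; boundary="' + BOUNDARY + b'"\r\n\r\n']
    for content_type, payload in parts:
        chunks.append(b"--" + BOUNDARY + b"\r\n")
        chunks.append(b"Content-Type: " + content_type + b"\r\n\r\n")
        chunks.append(payload + b"\r\n")
    chunks.append(b"--" + BOUNDARY + b"--\r\n")  # closing delimiter
    return b"".join(chunks)


def parse_message(raw):
    """Parse the whole in-memory message; return [(content_type, payload_bytes)]."""
    msg = BytesParser().parsebytes(raw)
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in msg.get_payload()  # list of sub-messages for multipart
    ]


# Placeholder payloads: ASCII filler stands in for a real Arrow IPC stream.
sent = [
    (b"application/json", b'[{"ticker": "SGJ"}]'),
    (b"application/vnd.apache.arrow.stream", b"A" * 1_000_000),
    (b"text/plain", b"Hello Client"),
]
raw = build_message(sent)

start = time.perf_counter()
received = parse_message(raw)
elapsed = time.perf_counter() - start

for content_type, payload in received:
    print(f"{content_type}: {len(payload)} bytes")
print(f"multipart parsing took {elapsed:.4f} seconds")
```

In a real client, the bytes of the Arrow part would then be passed to an Arrow IPC stream reader. Timing the two steps separately, as the output above does, is what shows that the multipart parsing dominates.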